Abstract

We are developing techniques to forecast terrorist events and
effective ways to present these forecasts to intelligence analysts.
Forecasts come from analyzing historical event data and geographical
information. We explore feature reduction techniques to make the
computations closer to real-time and techniques for representing the
confidence (or uncertainty) of the data.

CR Categories: G.3 [Mathematics of Computing]: Probability
and Statistics - Probabilistic Algorithms; H.5.0 [Information
Systems]: Information Interfaces and Presentation - General;
1  Introduction

The ability to forecast terrorist events is of utmost importance to
intelligence analysts and military planners carrying out
countermeasures in the global war on terror. We are currently
developing techniques to forecast the likeliest locations a terrorist
would target. We are extending earlier work [1] that uses historical
event data and geographic information system (GIS) data to generate
geospatial likelihood functions indicating where an attack may occur
next. Part of our effort is focused on the computationally intensive
problem of reducing the search space produced by the large amounts of
GIS and event data. We also explore how to represent the confidence of
the data by assessing and characterizing the types of uncertainty and
developing effective presentation approaches. We consider the impact
of error in the historical event and feature data, the choice of
feature reduction method, and the choice of likelihood function.

We briefly describe our progress in developing techniques for
feature reduction, event forecasting, and the associated display
techniques. We also highlight our current plans to incorporate
confidence (uncertainty) information into the forecast visualizations.

2  Feature  Reduction 

One of the challenges of working with comprehensive GIS layers is
the vast number of features available for consideration. Our data
ranges from just a few embassy locations to thousands of street
junctions. Because the events, usually bombings, are scattered
across the area of interest, it is not immediately apparent which
features are significant. The benefit of feature reduction is not just
to eliminate extraneous and possibly misleading pieces of
information, but also to reduce memory and computation time
requirements.

The simplest methods limit the number of features under
consideration based on certain metrics (such as a maximum distance
from the event) or constrain each feature to lie within a regional
bounding box. Both methods assume terrorists prefer certain spatial
features (consciously or not), such as buildings or streets near
the target location. The initial results are promising, although we
do not have a clear understanding of the distance at which features
remain viable for target selection.
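
The two simple filters described above can be sketched as follows. This is a minimal illustration, assuming planar coordinates in meters; the function names, the sample points, and the 500 m cutoff are hypothetical and are not taken from the testbed.

```python
import math

def within_distance(features, events, max_dist):
    """Keep features lying within max_dist of at least one event."""
    return [
        f for f in features
        if any(math.dist(f, e) <= max_dist for e in events)
    ]

def within_bounding_box(features, xmin, ymin, xmax, ymax):
    """Keep features whose coordinates fall inside a regional bounding box."""
    return [
        (x, y) for (x, y) in features
        if xmin <= x <= xmax and ymin <= y <= ymax
    ]

# Hypothetical event and feature locations (x, y) in meters.
events = [(0.0, 0.0), (1000.0, 0.0)]
features = [(100.0, 0.0), (5000.0, 5000.0), (1200.0, 100.0)]

near = within_distance(features, events, max_dist=500.0)
boxed = within_bounding_box(features, -500.0, -500.0, 1500.0, 500.0)
```

Either filter discards the far-away feature at (5000, 5000). Because the distance at which features remain relevant is unknown, the cutoff would have to be treated as a tunable parameter.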

Several numerical approaches are being reviewed, such as principal
components, clustering, and factor analysis. Currently we are
working with the Gini index [2]. The purpose of this method is to
provide a ranking for each feature based on inter-event distances.
Each event is represented as a vector of spatial distances from its
location to the location of each feature in the layers. If $d_{ij}$ is
the numeric distance between events $i$ and $j$ for the same vector
element, then their similarity $s_{ij}$ is calculated as:

$$s_{ij} = \frac{1}{1 + \alpha d_{ij}}$$

where $\alpha = 1/\bar{d}$ and $\bar{d}$ is the average numeric inter-event distance.
Note that spatial distance refers to the distance between an event
location and a feature location, while numeric distance refers to the
"distance" between two vector elements, that is, a distance between
distances.

The Gini index between two events is defined as
$$g_{ij} = 4 s_{ij} (1 - s_{ij}).$$

For the entire set of $n$ events, the averaged Gini index
$$\bar{g} = \frac{2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} g_{ij}}{n(n-1)}$$

is a suitable measure of cohesiveness. A lower value of $\bar{g}$ is
considered to denote similarity within the feature space. A further
extension of the method adjusts this value based on its disparity from
the background distribution for this feature. For example, if every
event occurs within 50 m of an ATM, but every location in the area is
also within 50 m of an ATM, then the distance to the ATMs is not a very
useful measurement. The number of features can be reduced by
establishing a cutoff threshold for the Gini index. An example of this
reduction is provided in Figure 1.
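
The Gini-based ranking can be sketched for a single feature as below. This is an illustrative reading of the definitions above, not the testbed code: it takes $d_{ij}$ to be the absolute difference between two events' distances to the feature, and the feature names, distances, and cutoff of 0.6 are made up. The background-distribution adjustment is not included.

```python
from itertools import combinations

def averaged_gini(distances):
    """Averaged Gini index for one feature.

    distances[i] is the spatial distance from event i to the feature.
    """
    pairs = list(combinations(distances, 2))
    if not pairs:
        raise ValueError("need at least two events")
    # Numeric distance d_ij: a distance between distances (assumed absolute difference).
    numeric = [abs(a - b) for a, b in pairs]
    mean_d = sum(numeric) / len(numeric)
    if mean_d == 0.0:
        return 0.0  # all distances identical: s_ij = 1, so every g_ij = 0
    alpha = 1.0 / mean_d
    total = 0.0
    for d in numeric:
        s = 1.0 / (1.0 + alpha * d)      # similarity s_ij
        total += 4.0 * s * (1.0 - s)     # Gini index g_ij
    # Dividing by the number of pairs equals 2 * sum / (n * (n - 1)).
    return total / len(numeric)

def select_features(feature_distances, cutoff):
    """Keep features whose averaged Gini index is at or below the cutoff."""
    return [
        name for name, dists in feature_distances.items()
        if averaged_gini(dists) <= cutoff
    ]

# Hypothetical distances (m) from four events to two feature types.
feature_distances = {
    "embassy": [10.0, 12.0, 11.0, 500.0],       # three events tightly grouped
    "gas_station": [0.0, 100.0, 200.0, 300.0],  # evenly spread
}
kept = select_features(feature_distances, cutoff=0.6)
```

With these numbers the tightly grouped feature scores lower than the evenly spread one, so only it survives the cutoff. Note that with exactly two events the index is always 1, since the single pair distance equals the mean.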

3  Event  Forecasting 

Methods for determining spatial preferences have been successfully
applied in urban settings to find potential crime hot spots by
looking at factors such as economics, populations, proximity of key
building types, and past criminal histories. Brown et al. [1] applied
the same technique to terrorist event preferences. We are using this
work as a roadmap for our efforts. Our goal is to develop forecast
image overlays for regional maps of the target locations that predict
the likeliest locations of terrorist events. The combined map and
overlay will aid security operations in determining the best places to
deploy security forces or sensing equipment.

Figure 1: (Left) Before and after the Gini index feature reduction technique is applied to embassy and gas station features. (Right) Likelihood
of terrorist attacks using (1) GIS information about locations of embassies and gas stations, and (2) historical terrorist events between 2001 and 2004.
Forecast layer generated by our testbed, converted to KML, and loaded into Google Earth.

We employ a few different methods to generate forecast overlays
for the geographic region of interest. One of these methods is the
Gaussian-based forecasting technique derived in [1]. The premise
of the technique is that a suicide bomber is directed toward a
certain location by a set of qualities such as geospatial features,
demographic information, and recent political events. Focusing on the
geospatial domain, we assume the intended target was associated
with the features located within a small distance of the event
location. Furthermore, we treat the distance between key features
and the event location as having the highest likelihood, and taper the
likelihood values as distances grow larger or smaller than it. We model
this effect using a Gaussian distribution centered at the distance
between key features and the event. The probability density function
(PDF) for a feature $i$ at a given grid cell $g$ is given by
where $D_{ig}$ is the distance from the feature to the grid cell, $D_{in}$ is
the distance from the feature to event location $n$, and $N$ is the total
number of events.

The joint density for the entire feature set is established by the
product of the individual feature densities,
where $I$ is the total number of features and $c$ is a constant.
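
The per-feature and joint densities described above can be sketched as follows. The exact formulas are not reproduced in this text, so the sketch assumes a standard Gaussian kernel averaged over the $N$ events, with a bandwidth `sigma` that is not specified in the source, and a joint density equal to `c` times the product over the $I$ features. All names and numbers are hypothetical.

```python
import math

def feature_density(d_ig, event_dists, sigma):
    """Density for one feature at one grid cell.

    d_ig        : distance from the feature to the grid cell
    event_dists : distances D_in from the feature to each of the N events
    sigma       : assumed Gaussian bandwidth (not given in the source)
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = sum(
        norm * math.exp(-0.5 * ((d_ig - d_in) / sigma) ** 2)
        for d_in in event_dists
    )
    return total / len(event_dists)

def joint_density(cell_dists, event_dists_by_feature, sigma, c=1.0):
    """Product of the per-feature densities, scaled by the constant c."""
    result = c
    for d_ig, event_dists in zip(cell_dists, event_dists_by_feature):
        result *= feature_density(d_ig, event_dists, sigma)
    return result

# Hypothetical distances (m) from two past events to two features.
event_dists_by_feature = [
    [100.0, 120.0],   # feature 0, e.g. nearest embassy
    [300.0, 280.0],   # feature 1, e.g. nearest gas station
]
cell_a = [110.0, 290.0]   # resembles the past events
cell_b = [900.0, 50.0]    # does not

score_a = joint_density(cell_a, event_dists_by_feature, sigma=50.0)
score_b = joint_density(cell_b, event_dists_by_feature, sigma=50.0)
```

A grid cell whose feature distances resemble those of past events (`cell_a`) receives a much higher joint density than one that does not (`cell_b`). For many features, summing log densities would avoid numerical underflow in the product.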

We also implemented and explored a k-nearest neighbors
approach: take the set of grid vectors for each feature,
calculate the k minimum distances to the event vectors, and keep the
median. The reasoning is that if a terrorist has a vector of
preference for a certain geospatial arrangement, then any grid cell
that is similar to it should be a likely candidate for a terrorist
attack.
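
One way to read the k-nearest neighbors scoring is sketched below. It assumes each grid cell and each event is described by a vector of distances to the features, and that vectors are compared with the Euclidean metric; the metric, the function name, and the sample vectors are assumptions, not details from the source.

```python
import math
import statistics

def knn_score(cell_vec, event_vecs, k):
    """Median of the k smallest distances from a grid-cell vector to the event vectors.

    A lower score means the cell's geospatial arrangement resembles past events.
    """
    dists = sorted(math.dist(cell_vec, ev) for ev in event_vecs)
    return statistics.median(dists[:k])

# Hypothetical event vectors: (distance to embassy, distance to gas station) in meters.
event_vecs = [(100.0, 300.0), (120.0, 280.0), (110.0, 290.0), (900.0, 50.0)]

similar_cell = knn_score((110.0, 290.0), event_vecs, k=3)
distant_cell = knn_score((900.0, 900.0), event_vecs, k=3)
```

Using the median rather than the minimum makes the score less sensitive to a single event that happens to match the cell.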


4  Confidence  Modeling 

One of the most important aspects of forecasting is having an
estimate of the confidence in the supporting numerical values. In
numerical weather prediction, some measure of certainty is always
associated with the forecasts. One example is a prediction of an 80%
chance of rain, which implies that the numerical weather modeler(s)
predicted that it would rain tomorrow 8 times out of 10. Having
confidence (or uncertainty) associated with the terrorist event
forecasts is very important. We have identified several sources of
uncertainty that must be modeled for each event forecast. We feel the
most important sources are: positional uncertainty associated with the
GIS and event data, error associated with the feature reduction, and
error in the choice of event prediction technique (i.e., error
associated with generating the likelihood functions). For now, we are
only beginning to model the positional error of the event locations,
for which we briefly describe our approach and show a mock-up of the
resulting visualization technique we plan to use.

Each historical event record contains the date, location, type of
attack, organization that claimed responsibility, a description of
what happened, and confidence of the recorded data. The confidence
values for the location are rated from 1 to 5, with error values
starting at ±1 m and increasing by a factor of 10 for each rank.
The values represent the uncertainty about the exact location of a
detonation as the analysts try to extract the information from news
sources. This location uncertainty affects the distances computed
from each event to the nearby features (e.g., buildings, street
intersections, subway stations). The distances become a range of
values $D_{in} \pm E(r)$, where $r$ is the rating index and $E(r)$ is
the error value. Accounting for this variation, a range of PDFs
results for each event location used in the computations.
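
The rating-to-error mapping and the resulting distance range can be sketched as below. The mapping follows the description above (±1 m at rating 1, ten times larger per rank); clamping the lower bound at zero is an added assumption, and the function names are hypothetical.

```python
def location_error(rating):
    """Positional error E(r) in meters for a confidence rating r from 1 to 5."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    return 10.0 ** (rating - 1)

def distance_range(distance, rating):
    """Return (low, nominal, high) event-to-feature distances for a given rating.

    The lower bound is clamped at zero, since a distance cannot be negative
    (an assumption; the source does not say how this case is handled).
    """
    err = location_error(rating)
    return (max(distance - err, 0.0), distance, distance + err)

low, nominal, high = distance_range(250.0, rating=3)
```

Each of the three distances can then be fed to the density computation to see how much the forecast shifts with the positional error.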

We plan to start by using the distances associated with the minima,
maxima, and median of the range of error (3 distances), producing 3 n
times as many PDFs. The first visualization will likely use an
interface slider to page through the resulting PDFs. The second
visualization will aggregate the highest-risk locations, fusing them
into one image. A third approach will use the median values to generate
the main PDF, and then use elevation to show the error (or range of
values) associated with the minima and maxima.


5  Results  and  Conclusions 


We have developed a software testbed for the algorithms using
Trolltech's Qt library (www.trolltech.com) combined with
OpenGL. We also generate forecast layers in the KML syntax
(earth.google.com/kml) and display them using Google Earth
(example shown in Figure 1 (right)). The testbed supports GIS and
historical event database formats: Microsoft Access and ESRI
(www.esri.com).
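
As an illustration of the KML export step, the sketch below wraps a rendered forecast image in a KML `GroundOverlay`. It is not the testbed's exporter: the function name, image file name, and coordinates are hypothetical, and it assumes the OGC KML 2.2 namespace rather than whichever KML version the testbed targets.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def forecast_overlay_kml(name, image_href, north, south, east, west):
    """Build a KML document that drapes a forecast image over a lat/lon box."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace

    def tag(local):
        return f"{{{KML_NS}}}{local}"

    kml = ET.Element(tag("kml"))
    overlay = ET.SubElement(kml, tag("GroundOverlay"))
    ET.SubElement(overlay, tag("name")).text = name
    icon = ET.SubElement(overlay, tag("Icon"))
    ET.SubElement(icon, tag("href")).text = image_href
    box = ET.SubElement(overlay, tag("LatLonBox"))
    for local, value in (("north", north), ("south", south),
                         ("east", east), ("west", west)):
        ET.SubElement(box, tag(local)).text = str(value)
    return ET.tostring(kml, encoding="unicode")

# Hypothetical forecast image and bounding box in decimal degrees.
doc = forecast_overlay_kml("forecast", "forecast.png", 1.0, 0.0, 1.0, 0.0)
```

Writing `doc` to a `.kml` file next to the image should allow it to be opened as an overlay in Google Earth.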

To conclude, we have explored several techniques for performing
feature reduction and developing forecasts, and have proposed several
techniques for incorporating confidence information into the
visualizations of the forecasts. Our efforts are ongoing and include
plans to explore other feature reduction methods (e.g., parallel
algorithms), other likelihood functions (e.g., likelihood ratios
involving one faction not being involved,

$$LR(x \mid f) = \frac{P(x \mid f)}{P(x \mid g)},$$

where $g$ represents a faction known not to be involved with the
event and $f$ represents all the factions), and methods for
representing confidence (or uncertainty). We are also beginning to
explore the appropriateness of using Bayesian analysis with Gibbs
sampling as a tool in our approach.